DETAILED ACTION

1. This office action is in response to application 10/626,984, filed on 7/25/2003.
Claims 1-53 are pending. 

Specification 

The disclosure is objected to because of the following informalities: Page 14 of
the specification states "Document A has scores corresponding to concepts 5 and 120
and Document B has scores corresponding to concepts 3, 5 and 120, thus these
documents only have concept 5 in common." It is believed, and the examiner will assume
for the purpose of this examination, that the highlighted 120 should read 47, since there
is no vector for Document B corresponding to 120 and that reading also satisfies the
requirement that A and B have only concept 5 in common. Appropriate correction is required.
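The inconsistency noted above can be illustrated with a short set comparison. This is only a sketch: it assumes the highlighted 120 is the one listed for Document B, and the variable names are hypothetical and do not come from the specification.

```python
# Concept identifiers as quoted from page 14 of the specification.
doc_a = {5, 120}
doc_b_as_printed = {3, 5, 120}

# Assumption: the highlighted 120 (taken here to be Document B's) should read 47.
doc_b_as_assumed = {3, 5, 47}

# Concepts shared by both documents under each reading.
common_as_printed = doc_a & doc_b_as_printed
common_as_assumed = doc_a & doc_b_as_assumed

print(sorted(common_as_printed))  # [5, 120]
print(sorted(common_as_assumed))  # [5]
```

As printed, the two documents would share concepts 5 and 120; only the assumed reading leaves concept 5 as the sole common concept.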

Claim Objections 

Claims 17 and 52 are objected to under 37 CFR 1.75(c) as being of improper
dependent form for failing to further limit the subject matter of a previous claim.
Applicant is required to cancel the claim(s), or amend the claim(s) to place the claim(s)
in proper dependent form, or rewrite the claim(s) in independent form. Claims 17 and 52
contain essentially the same limitations as claims 9 and 35, respectively.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

Claims 1-53 are rejected under 35 U.S.C. 101 because the claimed invention is
directed to non-statutory subject matter. Claims 1-53 are not limited to
embodiments that fall into a statutory category. Page 9 lines 17-23 of the specification
state that each module is a computer program, procedure or module written as source
code. The various implementations of the source code and object codes can be held on
a computer readable storage medium or embodied on a transmission medium in a carrier
wave. Carrier waves, however, are not a tangible embodiment; therefore every
claim is directed toward subject matter that cannot be perceived as tangible and is
rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 112

The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

Claims 19-23, 31, and 36-40 are rejected under 35 U.S.C. 112, second paragraph, as
being indefinite for failing to particularly point out and distinctly claim the subject matter
which applicant regards as the invention. Each of the above claims states "substantially
in accordance with the formula." The word "substantially" implies that there may possibly
be more terms in the equation, whereas in the specification the described embodiment is
calculated in accordance with the equations, not substantially in accordance with the
equations. Thus claims 19-23, 31, and 36-40 have not been treated any further on the merits.



Claim Rejections - 35 USC § 102 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - (a) the invention was known or used by others in this
country, or patented or described in a printed publication in this or a foreign country, before the invention 
thereof by the applicant for a patent. 

Claims 1-17 are rejected under 35 U.S.C. 102(a) as being anticipated by WO
03060766 (hereinafter Lin) (art of record).

As for claim 1 Lin discloses: a scoring module determining a score assigned to at 
least one concept extracted from a plurality of documents (See page 17 lines 20-24 
note: definition of document corpus) based on at least one of a frequency of occurrence 
of the at least one concept within at least one such document, a concept weight, a 
structural weight, and a corpus weight; (See page 7 lines 20-24) and a clustering 
module forming clusters of the documents by applying the score for the at least one 
concept to a best fit criterion for each such document (See page 19 lines 4-10). 

As for claim 2 the rejection of claim 1 is incorporated, and further Lin discloses: 
the scoring module calculating the score as a function of a summation of at least one of 
the frequency of occurrence, the concept weight, the structural weight, and the corpus
weight of the at least one concept (See page 23 lines 1-4).

As for claim 3 the rejection of claim 2 is incorporated, and further Lin discloses: a 
compression module compressing the score through logarithmic compression (See 
page 17 lines 30-34).

As for claim 4 the rejection of claim 1 is incorporated, and further Lin discloses: 
the scoring module calculating the concept weight as a function of a number of terms 
comprising the at least one concept (See page 21 lines 25-28). 

As for claim 5 the rejection of claim 1 is incorporated, and further Lin discloses: 
the scoring module calculating the structural weight as a function of a location of the at 
least one concept within the at least one such document (See page 18 lines 10-14). 

As for claim 6 the rejection of claim 1 is incorporated, and further Lin discloses: 
the scoring module calculating the corpus weight as a function of a reference count of 
the at least one concept over the plurality of documents (See page 18 lines 19-21 note:
this is an inverse weight of the reference count). 

As for claim 7 the rejection of claim 1 is incorporated, and further Lin discloses: 
the scoring module forming the score assigned to the at least one concept to a 
normalized score vector for each such document, determining a similarity between the
normalized score vector for each such document as an inner product of each
normalized score vector, and applying the similarity to the best fit criterion (See page 30
line 30 - page 31 line 1).
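The limitation recited for claim 7 (normalized score vectors compared by inner product) can be sketched as follows. This is a minimal illustration only, not the method of the application or of Lin: the function names, the sparse dictionary representation, and the sample scores are all hypothetical.

```python
import math

def normalize(scores):
    """Scale a sparse {concept_id: score} vector to unit length."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0:
        return dict(scores)
    return {concept: s / norm for concept, s in scores.items()}

def similarity(doc_a, doc_b):
    """Inner product of the two normalized score vectors."""
    na, nb = normalize(doc_a), normalize(doc_b)
    # Only concepts present in both documents contribute to the product.
    return sum(weight * nb[concept] for concept, weight in na.items() if concept in nb)

# Hypothetical scores keyed by concept identifier.
doc_a = {5: 2.0, 120: 1.0}
doc_b = {3: 1.0, 5: 2.0, 47: 1.0}

print(round(similarity(doc_a, doc_b), 4))
```

Because each vector is scaled to unit length first, the inner product falls between 0 and 1 for non-negative scores, which makes it convenient to compare against a fixed best fit criterion.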

As for claim 8 the rejection of claim 1 is incorporated, and further Lin discloses: 
the clustering module evaluating a set of candidate seed documents selected from the 
plurality of documents, identifying a set of seed documents by applying the score for the 
at least one concept to a best fit criterion for each such candidate seed document, and 
basing the best fit criterion on the score of each such seed document (See page 28 lines
8-16 note: representative = seed).

Claims 9-16 are method claims corresponding to system claims 1-8 respectively, 
and are thus rejected for the reasons set forth in the rejection of claims 1-8. 

Claim 17 is rejected for the same reasons as claim 9. 

As for claim 18 Lin discloses: a frequency module determining a frequency of
occurrence of at least one concept within a document retrieved from the document set
(See page 18 lines 1-3); and a concept weight module analyzing a concept weight
reflecting a specificity of meaning for the at least one concept within the document (See
page 25 lines 27-30 note: rtc(t.c) is a value based on meaning); a structural weight
module analyzing a structural weight reflecting a degree of significance based on
structural location within the document for the at least one concept (See page 18 lines
8-13); a corpus weight module analyzing a corpus weight inversely weighing a
reference count of occurrences for the at least one concept within the document (See
page 18 lines 19-21 note: this is an inverse weight of the reference count); and a
scoring module evaluating a score associated with the at least one concept as a
function of the frequency, concept weight, structural weight, and corpus weight (See
page 21 lines 24-27).

Application/Control Number: 10/626,984 Page 7

Art Unit: 2166



Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148
USPQ 459 (1966), that are applied for establishing a background for determining
obviousness under 35 U.S.C. 103(a) are summarized as follows:

1. Determining the scope and contents of the prior art.

2. Ascertaining the differences between the prior art and the claims at issue. 

3. Resolving the level of ordinary skill in the pertinent art. 

4. Considering objective evidence present in the application indicating 
obviousness or nonobviousness. 

Claim 24 is rejected under 35 U.S.C. 103(a) as being unpatentable over Lin as
applied to claim 18 above, and further in view of US 6675159 (hereinafter Klein) (art of
record).

As for claim 24 the rejection of claim 18 is incorporated, and further Klein 
discloses: a global stop concept vector cache maintaining concepts and terms (See 
column 18 lines 17-20 and See column 14 lines 45-49); and a filtering module filtering 
selection of the at least one concept based on the concepts and terms maintained in the 
global stop concept vector cache (See column 14 lines 45-50). It would have been 
obvious to an artisan of ordinary skill in the pertinent art at the time of the invention to 
have incorporated the teachings of Klein into the system of Lin. The modification would 
have been obvious because queries and documents are linked by the fact that words are
the entities that are being processed. Therefore, any transformation capable of being
made to a query should be able to be applied to documents too; this makes all document
management systems more efficient and easier to maintain.

As for claim 25 the rejection of claim 18 is incorporated, and further Klein 
discloses: a parsing module identifying terms within at least one document in the 
document set, and combining the identified terms into one or more of the concepts (See 
column 2 lines 53-56). 

As for claim 26 the rejection of claim 25 is incorporated, and further Klein
discloses: the parsing module structuring each such identified term in the one or more 
concepts into canonical concepts comprising at least one of word root, character case, 
and word ordering (See column 14 lines 63-67). 

As for claim 27 the rejection of claim 25 is incorporated, and further Klein 
discloses wherein at least one of nouns, proper nouns and adjectives are included as 
terms (See column 14 lines 40-44). 

Claims 28-30 and 32-34 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Lin as applied to claim 18 above, and further in view of US 5794236 (hereinafter
Mehrle).

As for claim 28 the rejection of claim 18 is incorporated, and further Mehrle 
discloses a plurality of candidate seed documents (See column 2 lines 42-46), a 
similarity module determining a similarity between each pair of a candidate seed 
document and a cluster center (See column 8 lines 14-23); a clustering module 
designating each such candidate seed document separated from substantially all cluster 
centers with such similarity being sufficiently distinct as a seed document, and grouping 
each such candidate seed document not being sufficiently distinct into a cluster with a 
nearest cluster center (See column 9 lines 3-10). It would have been obvious to an 
artisan of ordinary skill in the pertinent art to have incorporated the teachings of Mehrle 
into the system of Lin. The modification would have been obvious because having
seeds allows for more efficient clustering and document retrieval. 
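The seed selection recited for claim 28 can be sketched as a single pass over the candidates. This is an illustrative sketch only, under stated assumptions: each cluster center is taken to be its seed document's vector, similarity is cosine similarity, a candidate is "sufficiently distinct" when its similarity to every existing center falls below a fixed threshold, and all names and values are hypothetical rather than drawn from the claims, Lin, or Mehrle.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def select_seeds(candidates, threshold):
    """Return (seed indices, {seed index: member indices})."""
    seeds = []
    clusters = {}
    for idx, vec in enumerate(candidates):
        # Similarity of this candidate to every existing cluster center.
        sims = [(cosine(vec, candidates[s]), s) for s in seeds]
        best_sim, best_seed = max(sims, default=(0.0, None))
        if best_seed is None or best_sim < threshold:
            # Sufficiently distinct from all centers: becomes a new seed.
            seeds.append(idx)
            clusters[idx] = [idx]
        else:
            # Not distinct enough: grouped with the nearest cluster center.
            clusters[best_seed].append(idx)
    return seeds, clusters

# Hypothetical candidate seed documents as two-concept score vectors.
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
seeds, clusters = select_seeds(candidates, threshold=0.5)
print(seeds)     # [0, 2]
print(clusters)  # {0: [0, 1], 2: [2]}
```

The second candidate lies close to the first and is grouped with it, while the third shares no concept with the first and is designated a seed of its own.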

As for claim 29 the rejection of claim 28 is incorporated, and further Mehrle 
discloses: a plurality of candidate seed documents; a similarity module determining a 
similarity between each pair of a candidate seed document and a cluster center; a 
clustering module designating each such candidate seed document separated from 
substantially all cluster centers with such similarity being sufficiently distinct as a seed 
document, and grouping each such candidate seed document not being sufficiently
distinct into a cluster with a nearest cluster center. 

As for claim 32 the rejection of claim 29 is incorporated, and further Mehrle 
discloses: a dynamic threshold module determining a dynamic threshold for each 
cluster based on the similarities between each document in the cluster and a center of 
the cluster (See column 6 lines 30-42); and the similarity module identifying each
outlier document having such a similarity outside the dynamic threshold (See column 9 
lines 3-10). 

As for claim 33 the rejection of claim 32 is incorporated, and further Mehrle 
discloses: the clustering module grouping each such outlier document into a cluster 
having a best fit, subject to a minimum fit criterion and the dynamic threshold of the 
cluster (See column 3 lines 7-16). 

As for claim 34 the rejection of claim 32 is incorporated, and further wherein the
dynamic threshold is determined based on the similarities of the documents in the 
cluster to the cluster center (See column 6 lines 30-37). 

Claim 35 is a method claim corresponding to system claim 18, and is thus 
rejected for the reasons set forth in the rejection of claim 18. 

Claims 41-47 and 49-51 are method claims corresponding to system claims 24-30
and 32-34, respectively, and are thus rejected for the same reasons as set forth in the
rejection of claims 24-30 and 32-34.

Claim 52 is rejected for the same reason as claim 35.

Claim 53 is an apparatus claim corresponding to method claim 18 and is thus
rejected for the same reasons as claim 18.

Claim 30 is rejected under 35 U.S.C. 103(a) as being unpatentable over Lin and 
Mehrle as applied to claim 29 above, and further in view of Klein. 

As for claim 30 the rejection of claim 29 is incorporated, and further Klein 
discloses: a normalized score vector for each document comprising the score 
associated with the at least one concept for each such concept occurring within the
document (See column 3 lines 18-21); and the similarity module determining the 
similarity as a function of the normalized score vector associated with the at least one 
concept for each such document (See column 18 lines 23-26). 



Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Leon J. Harper, whose telephone number is
571-272-0759. The examiner can normally be reached from 7:30 AM to 4:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Hosain T. Alam, can be reached at 571-272-3978. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Leon J Harper
February 13, 2006 




